Heart Disease Profiling Report

Overview

Dataset statistics

Number of variables 14
Number of observations 303
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 1
Duplicate rows (%) 0.3%
Total size in memory 33.3 KiB
Average record size in memory 112.4 B
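
The counts in this overview can be recomputed from the raw rows. The sketch below uses only the Python standard library. The function name dataset_overview and the four-row sample are hypothetical, and counting a duplicate as each extra copy of an already-seen row is an assumption about the convention, not something the report states.

```python
from collections import Counter


def dataset_overview(rows, columns):
    """Summarise a list of row tuples; None marks a missing cell."""
    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    missing = sum(value is None for row in rows for value in row)
    counts = Counter(rows)
    # Assumption: a "duplicate row" is every extra copy of an already-seen row.
    duplicates = sum(c - 1 for c in counts.values())
    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Hypothetical three-column excerpt (age, sex, cp); the last row repeats the first.
columns = ("age", "sex", "cp")
rows = [(63, 1, 3), (37, 1, 2), (41, 0, 1), (63, 1, 3)]
overview = dataset_overview(rows, columns)
print(overview)

# The percentage reported above follows the same arithmetic: 1 of 303 rows.
reported_duplicate_pct = round(100.0 * 1 / 303, 1)
print(reported_duplicate_pct)
```

With the full 303-row dataset, the same function would be expected to report 14 variables, 303 observations and 0 missing cells, provided the rows are loaded as tuples.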

Variable types

Numeric 5
Categorical 9

Alerts

Dataset has 1 (0.3%) duplicate rows Duplicates
oldpeak is highly correlated with slope High correlation
slope is highly correlated with oldpeak High correlation
thal is highly correlated with target High correlation
target is highly correlated with thal and 1 other fields High correlation
cp is highly correlated with target High correlation
age is highly correlated with thalach High correlation
sex is highly correlated with thal High correlation
cp is highly correlated with exang and 1 other fields High correlation
restecg is highly correlated with oldpeak High correlation
thalach is highly correlated with age and 2 other fields High correlation
exang is highly correlated with cp and 2 other fields High correlation
oldpeak is highly correlated with restecg and 1 other fields High correlation
slope is highly correlated with oldpeak High correlation
thal is highly correlated with sex and 1 other fields High correlation
target is highly correlated with cp and 3 other fields High correlation
oldpeak has 99 (32.7%) zeros Zeros

Reproduction

Analysis started 2023-04-06 11:15:08.282257
Analysis finished 2023-04-06 11:15:18.111835
Duration 9.83 seconds
Software version pandas-profiling v3.2.0
Download configuration config.json

Variables

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct 41
Distinct (%) 13.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 54.36633663
Minimum 29
Maximum 77
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 29
5-th percentile 39.1
Q1 47.5
median 55
Q3 61
95-th percentile 68
Maximum 77
Range 48
Interquartile range (IQR) 13.5
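
A fractional quartile such as Q1 = 47.5 arises when the quantile position falls between two sorted observations. The sketch below assumes linear interpolation between the closest ranks (the pandas default). The helper name quantile is hypothetical, and the sample is the ten ages listed under "First rows" in the Sample section, not the full dataset.

```python
def quantile(values, q):
    """Quantile with linear interpolation between closest ranks (0 <= q <= 1)."""
    data = sorted(values)
    if not data:
        raise ValueError("quantile of empty data")
    pos = (len(data) - 1) * q          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac


# Ages from the ten "First rows" of this report (a small excerpt only).
ages = [63, 37, 41, 56, 57, 57, 56, 44, 52, 57]
q1, q3 = quantile(ages, 0.25), quantile(ages, 0.75)
iqr = q3 - q1
value_range = max(ages) - min(ages)
print(q1, q3, iqr, value_range)

# The derived rows of the table above follow from the reported quantiles.
reported_range = 77 - 29       # Maximum - Minimum
reported_iqr = 61 - 47.5       # Q3 - Q1
print(reported_range, reported_iqr)
```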

Descriptive statistics

Standard deviation 9.08210099
Coefficient of variation (CV) 0.1670537607
Kurtosis -0.542167141
Mean 54.36633663
Median Absolute Deviation (MAD) 7
Skewness -0.2024633655
Sum 16473
Variance 82.48455839
Monotonicity Not monotonic
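
Several of the descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks these against the age figures above. The helper median_abs_deviation is a hypothetical name, and it assumes the unscaled definition of the MAD.

```python
import math
import statistics

n = 303                 # Number of observations (Overview)
total = 16473           # Sum
std = 9.08210099        # Standard deviation, as printed in the report

mean = total / n        # expected to match the reported Mean
variance = std ** 2     # expected to match the reported Variance
cv = std / mean         # expected to match the reported CV


def median_abs_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


print(mean, variance, cv)
print(median_abs_deviation([1, 2, 3, 4, 100]))
```

Because the report prints rounded figures, the recomputed values agree only up to the last printed digit.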
Histogram with fixed size bins (bins=41)
Value Count Frequency (%)
58 19 6.3%
57 17 5.6%
54 16 5.3%
59 14 4.6%
52 13 4.3%
51 12 4.0%
62 11 3.6%
60 11 3.6%
44 11 3.6%
56 11 3.6%
Other values (31) 168 55.4%

Value Count Frequency (%)
29 1 0.3%
34 2 0.7%
35 4 1.3%
37 2 0.7%
38 3 1.0%
39 4 1.3%
40 3 1.0%
41 10 3.3%
42 8 2.6%
43 8 2.6%

Value Count Frequency (%)
77 1 0.3%
76 1 0.3%
74 1 0.3%
71 3 1.0%
70 4 1.3%
69 3 1.0%
68 4 1.3%
67 9 3.0%
66 7 2.3%
65 8 2.6%

sex
Categorical

HIGH CORRELATION

Distinct 2
Distinct (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
1 207
0 96

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 0
4th row 1
5th row 0

Common Values

Value Count Frequency (%)
1 207 68.3%
0 96 31.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
1 207 68.3%
0 96 31.7%

Most occurring characters

Value Count Frequency (%)
1 207 68.3%
0 96 31.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 207 68.3%
0 96 31.7%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 207 68.3%
0 96 31.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 207 68.3%
0 96 31.7%

cp
Categorical

HIGH CORRELATION

Distinct 4
Distinct (%) 1.3%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
0 143
2 87
1 50
3 23

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 3
2nd row 2
3rd row 1
4th row 1
5th row 0

Common Values

Value Count Frequency (%)
0 143 47.2%
2 87 28.7%
1 50 16.5%
3 23 7.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0 143 47.2%
2 87 28.7%
1 50 16.5%
3 23 7.6%

Most occurring characters

Value Count Frequency (%)
0 143 47.2%
2 87 28.7%
1 50 16.5%
3 23 7.6%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 143 47.2%
2 87 28.7%
1 50 16.5%
3 23 7.6%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 143 47.2%
2 87 28.7%
1 50 16.5%
3 23 7.6%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 143 47.2%
2 87 28.7%
1 50 16.5%
3 23 7.6%

trestbps
Real number (ℝ≥0)

Distinct 49
Distinct (%) 16.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 131.6237624
Minimum 94
Maximum 200
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 94
5-th percentile 108
Q1 120
median 130
Q3 140
95-th percentile 160
Maximum 200
Range 106
Interquartile range (IQR) 20

Descriptive statistics

Standard deviation 17.53814281
Coefficient of variation (CV) 0.1332445031
Kurtosis 0.9290540528
Mean 131.6237624
Median Absolute Deviation (MAD) 10
Skewness 0.7137684379
Sum 39882
Variance 307.5864533
Monotonicity Not monotonic
Histogram with fixed size bins (bins=49)
Value Count Frequency (%)
120 37 12.2%
130 36 11.9%
140 32 10.6%
110 19 6.3%
150 17 5.6%
138 13 4.3%
128 12 4.0%
160 11 3.6%
125 11 3.6%
112 9 3.0%
Other values (39) 106 35.0%

Value Count Frequency (%)
94 2 0.7%
100 4 1.3%
101 1 0.3%
102 2 0.7%
104 1 0.3%
105 3 1.0%
106 1 0.3%
108 6 2.0%
110 19 6.3%
112 9 3.0%

Value Count Frequency (%)
200 1 0.3%
192 1 0.3%
180 3 1.0%
178 2 0.7%
174 1 0.3%
172 1 0.3%
170 4 1.3%
165 1 0.3%
164 1 0.3%
160 11 3.6%

chol
Real number (ℝ≥0)

Distinct 152
Distinct (%) 50.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 246.2640264
Minimum 126
Maximum 564
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 126
5-th percentile 175
Q1 211
median 240
Q3 274.5
95-th percentile 326.9
Maximum 564
Range 438
Interquartile range (IQR) 63.5

Descriptive statistics

Standard deviation 51.83075099
Coefficient of variation (CV) 0.2104682188
Kurtosis 4.505423168
Mean 246.2640264
Median Absolute Deviation (MAD) 32
Skewness 1.143400821
Sum 74618
Variance 2686.426748
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
204 6 2.0%
197 6 2.0%
234 6 2.0%
269 5 1.7%
254 5 1.7%
212 5 1.7%
211 4 1.3%
240 4 1.3%
177 4 1.3%
243 4 1.3%
Other values (142) 254 83.8%

Value Count Frequency (%)
126 1 0.3%
131 1 0.3%
141 1 0.3%
149 2 0.7%
157 1 0.3%
160 1 0.3%
164 1 0.3%
166 1 0.3%
167 1 0.3%
168 1 0.3%

Value Count Frequency (%)
564 1 0.3%
417 1 0.3%
409 1 0.3%
407 1 0.3%
394 1 0.3%
360 1 0.3%
354 1 0.3%
353 1 0.3%
342 1 0.3%
341 1 0.3%

fbs
Categorical

Distinct 2
Distinct (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
0 258
1 45

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

Value Count Frequency (%)
0 258 85.1%
1 45 14.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0 258 85.1%
1 45 14.9%

Most occurring characters

Value Count Frequency (%)
0 258 85.1%
1 45 14.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 258 85.1%
1 45 14.9%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 258 85.1%
1 45 14.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 258 85.1%
1 45 14.9%

restecg
Categorical

HIGH CORRELATION

Distinct 3
Distinct (%) 1.0%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
1 152
0 147
2 4

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 1
3rd row 0
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 152 50.2%
0 147 48.5%
2 4 1.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
1 152 50.2%
0 147 48.5%
2 4 1.3%

Most occurring characters

Value Count Frequency (%)
1 152 50.2%
0 147 48.5%
2 4 1.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 152 50.2%
0 147 48.5%
2 4 1.3%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 152 50.2%
0 147 48.5%
2 4 1.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 152 50.2%
0 147 48.5%
2 4 1.3%

thalach
Real number (ℝ≥0)

HIGH CORRELATION

Distinct 91
Distinct (%) 30.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 149.6468647
Minimum 71
Maximum 202
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 71
5-th percentile 108.1
Q1 133.5
median 153
Q3 166
95-th percentile 181.9
Maximum 202
Range 131
Interquartile range (IQR) 32.5

Descriptive statistics

Standard deviation 22.90516111
Coefficient of variation (CV) 0.1530614167
Kurtosis -0.06196993058
Mean 149.6468647
Median Absolute Deviation (MAD) 15
Skewness -0.5374096527
Sum 45343
Variance 524.6464057
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
162 11 3.6%
160 9 3.0%
163 9 3.0%
152 8 2.6%
173 8 2.6%
125 7 2.3%
144 7 2.3%
143 7 2.3%
150 7 2.3%
132 7 2.3%
Other values (81) 223 73.6%

Value Count Frequency (%)
71 1 0.3%
88 1 0.3%
90 1 0.3%
95 1 0.3%
96 2 0.7%
97 1 0.3%
99 1 0.3%
103 2 0.7%
105 3 1.0%
106 1 0.3%

Value Count Frequency (%)
202 1 0.3%
195 1 0.3%
194 1 0.3%
192 1 0.3%
190 1 0.3%
188 1 0.3%
187 1 0.3%
186 2 0.7%
185 1 0.3%
184 1 0.3%

exang
Categorical

HIGH CORRELATION

Distinct 2
Distinct (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
0 204
1 99

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 1

Common Values

Value Count Frequency (%)
0 204 67.3%
1 99 32.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0 204 67.3%
1 99 32.7%

Most occurring characters

Value Count Frequency (%)
0 204 67.3%
1 99 32.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 204 67.3%
1 99 32.7%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 204 67.3%
1 99 32.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 204 67.3%
1 99 32.7%

oldpeak
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct 40
Distinct (%) 13.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 1.03960396
Minimum 0
Maximum 6.2
Zeros 99
Zeros (%) 32.7%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0.8
Q3 1.6
95-th percentile 3.4
Maximum 6.2
Range 6.2
Interquartile range (IQR) 1.6

Descriptive statistics

Standard deviation 1.161075022
Coefficient of variation (CV) 1.116843593
Kurtosis 1.575813073
Mean 1.03960396
Median Absolute Deviation (MAD) 0.8
Skewness 1.269719931
Sum 315
Variance 1.348095207
Monotonicity Not monotonic
Histogram with fixed size bins (bins=40)
Value Count Frequency (%)
0 99 32.7%
1.2 17 5.6%
1 14 4.6%
0.6 14 4.6%
1.4 13 4.3%
0.8 13 4.3%
0.2 12 4.0%
1.6 11 3.6%
1.8 10 3.3%
0.4 9 3.0%
Other values (30) 91 30.0%

Value Count Frequency (%)
0 99 32.7%
0.1 7 2.3%
0.2 12 4.0%
0.3 3 1.0%
0.4 9 3.0%
0.5 5 1.7%
0.6 14 4.6%
0.7 1 0.3%
0.8 13 4.3%
0.9 3 1.0%

Value Count Frequency (%)
6.2 1 0.3%
5.6 1 0.3%
4.4 1 0.3%
4.2 2 0.7%
4 3 1.0%
3.8 1 0.3%
3.6 4 1.3%
3.5 1 0.3%
3.4 3 1.0%
3.2 2 0.7%

slope
Categorical

HIGH CORRELATION

Distinct 3
Distinct (%) 1.0%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
2 142
1 140
0 21

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 2
4th row 2
5th row 2

Common Values

Value Count Frequency (%)
2 142 46.9%
1 140 46.2%
0 21 6.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
2 142 46.9%
1 140 46.2%
0 21 6.9%

Most occurring characters

Value Count Frequency (%)
2 142 46.9%
1 140 46.2%
0 21 6.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
2 142 46.9%
1 140 46.2%
0 21 6.9%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
2 142 46.9%
1 140 46.2%
0 21 6.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
2 142 46.9%
1 140 46.2%
0 21 6.9%

ca
Categorical

Distinct 5
Distinct (%) 1.7%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
0 175
1 65
2 38
3 20
4 5

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

Value Count Frequency (%)
0 175 57.8%
1 65 21.5%
2 38 12.5%
3 20 6.6%
4 5 1.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0 175 57.8%
1 65 21.5%
2 38 12.5%
3 20 6.6%
4 5 1.7%

Most occurring characters

Value Count Frequency (%)
0 175 57.8%
1 65 21.5%
2 38 12.5%
3 20 6.6%
4 5 1.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 175 57.8%
1 65 21.5%
2 38 12.5%
3 20 6.6%
4 5 1.7%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 175 57.8%
1 65 21.5%
2 38 12.5%
3 20 6.6%
4 5 1.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 175 57.8%
1 65 21.5%
2 38 12.5%
3 20 6.6%
4 5 1.7%

thal
Categorical

HIGH CORRELATION

Distinct 4
Distinct (%) 1.3%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
2 166
3 117
1 18
0 2

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 2
3rd row 2
4th row 2
5th row 2

Common Values

Value Count Frequency (%)
2 166 54.8%
3 117 38.6%
1 18 5.9%
0 2 0.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
2 166 54.8%
3 117 38.6%
1 18 5.9%
0 2 0.7%

Most occurring characters

Value Count Frequency (%)
2 166 54.8%
3 117 38.6%
1 18 5.9%
0 2 0.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
2 166 54.8%
3 117 38.6%
1 18 5.9%
0 2 0.7%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
2 166 54.8%
3 117 38.6%
1 18 5.9%
0 2 0.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
2 166 54.8%
3 117 38.6%
1 18 5.9%
0 2 0.7%

target
Categorical

HIGH CORRELATION

Distinct 2
Distinct (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
1 165
0 138

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 303
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 165 54.5%
0 138 45.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
1 165 54.5%
0 138 45.5%

Most occurring characters

Value Count Frequency (%)
1 165 54.5%
0 138 45.5%

Most occurring categories

Value Count Frequency (%)
Decimal Number 303 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 165 54.5%
0 138 45.5%

Most occurring scripts

Value Count Frequency (%)
Common 303 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 165 54.5%
0 138 45.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 303 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 165 54.5%
0 138 45.5%

Interactions

[25 interaction plots; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
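
The recipe above (rank both variables, then correlate the ranks) can be sketched with the standard library alone. The function names are hypothetical, and giving tied values the average of their positions is an assumption about tie handling, since the report does not say how ties are ranked.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship still gives a rho of 1 (up to rounding).
rho = spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
print(rho)
```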

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
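
As a minimal sketch of that calculation, the function below divides the covariance by the product of the standard deviations and then checks the invariance under shifts and positive rescaling. The name pearson_r and the sample values are hypothetical and are not taken from the dataset.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical, roughly linear sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)

# Shifting and rescaling either variable by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_shifted)
```

Rescaling by a negative factor flips the sign of r but not its magnitude.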

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
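
The pair-counting definition translates directly into code. The sketch below implements that plain form (often called tau-a) under the hypothetical name kendall_tau_a. Libraries such as SciPy default to the tie-corrected tau-b variant, so values may differ from the report when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1      # both variables move in the same direction
        elif s < 0:
            discordant += 1      # the variables move in opposite directions
        # s == 0 means a tie in x or y; it counts towards neither tally
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```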

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
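
A minimal sketch of the bias-corrected estimator, assuming the usual form of Bergsma's correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1), and the row and column counts are shrunk in the same way. The function name and the contingency tables are hypothetical. No continuity correction is applied to χ², which may differ from what the profiling software does.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical 2 x 2 tables: proportional rows (independent) and a diagonal table.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(v_independent, v_perfect)
```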

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
0 63 1 3 145 233 1 0 150 0 2.3 0 0 1 1
1 37 1 2 130 250 0 1 187 0 3.5 0 0 2 1
2 41 0 1 130 204 0 0 172 0 1.4 2 0 2 1
3 56 1 1 120 236 0 1 178 0 0.8 2 0 2 1
4 57 0 0 120 354 0 1 163 1 0.6 2 0 2 1
5 57 1 0 140 192 0 1 148 0 0.4 1 0 1 1
6 56 0 1 140 294 0 0 153 0 1.3 1 0 2 1
7 44 1 1 120 263 0 1 173 0 0.0 2 0 3 1
8 52 1 2 172 199 1 1 162 0 0.5 2 0 3 1
9 57 1 2 150 168 0 1 174 0 1.6 2 0 2 1

Last rows

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
293 67 1 2 152 212 0 0 150 0 0.8 1 0 3 0
294 44 1 0 120 169 0 1 144 1 2.8 0 0 1 0
295 63 1 0 140 187 0 0 144 1 4.0 2 2 3 0
296 63 0 0 124 197 0 1 136 1 0.0 1 0 2 0
297 59 1 0 164 176 1 0 90 0 1.0 1 2 1 0
298 57 0 0 140 241 0 1 123 1 0.2 1 0 3 0
299 45 1 3 110 264 0 1 132 0 1.2 1 0 3 0
300 68 1 0 144 193 1 1 141 0 3.4 1 2 3 0
301 57 1 0 130 131 0 1 115 1 1.2 1 1 3 0
302 57 0 1 130 236 0 0 174 0 0.0 1 1 2 0

Duplicate rows

Most frequently occurring

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target # duplicates
0 38 1 2 138 175 0 1 173 0 0.0 2 4 2 1 2